
4,012 research outputs found

Extending Dependencies with Conditions

Authors: Bravo Loreto; Fan Wenfei; Ma Shuai
Publication date: 01/01/2007

Edinburgh Research Explorer

Detecting Inconsistencies in Distributed Data

Authors: Fan Wenfei; Geerts Floris; Ma Shuai; Mueller Heiko
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2010

Edinburgh Research Explorer

Reasoning about Record Matching Rules

Authors: Fan Wenfei; Jia Xibei; Li Jianzhong; Ma Shuai
Publication date: 01/01/2009

Edinburgh Research Explorer

Graph Homomorphism Revisited for Graph Matching

Authors: Fan Wenfei; Li Jianzhong; Ma Shuai; Wang Hongzhi; Wu Yinghui
Publication date: 01/01/2010

Edinburgh Research Explorer

Capturing Topology in Graph Pattern Matching

Authors: Cao Yang; Fan Wenfei; Huai Jinpeng; Ma Shuai; Wo Tianyu
Publication date: 01/01/2011

Graph pattern matching is often defined in terms of subgraph isomorphism, an NP-complete problem. To lower its complexity, various extensions of graph simulation have been considered instead. These extensions allow pattern matching to be conducted in cubic time. However, they fall short of capturing the topology of data graphs: matched graphs may have a structure drastically different from the pattern graphs they match, and the matches found are often too large to understand and analyze. To rectify these problems, this paper proposes a notion of strong simulation, a revision of graph simulation, for graph pattern matching. (1) We identify a set of criteria for preserving the topology of graphs matched. We show that strong simulation preserves the topology of data graphs and finds a bounded number of matches. (2) We show that strong simulation retains the same complexity as earlier extensions of simulation, by providing a cubic-time algorithm for computing strong simulation. (3) We present the locality property of strong simulation, which allows us to effectively conduct pattern matching on distributed graphs. (4) We experimentally verify the effectiveness and efficiency of these algorithms, using real-life data and synthetic data.
Comment: VLDB201
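For readers unfamiliar with the baseline the abstract builds on, the following is a minimal sketch of plain graph simulation, computed as a fixpoint that prunes candidate matches. It is not the paper's strong simulation algorithm, and it makes no attempt at the cubic-time bound; the function name `graph_simulation`, the dictionary-based graph encoding, and the node names are all assumptions made for illustration.

```python
def graph_simulation(q_labels, q_edges, g_labels, g_edges):
    """Return the maximum simulation relation as {pattern node: set of data nodes}.

    q_labels / g_labels map node -> label; q_edges / g_edges are lists of
    directed (source, target) pairs. Returns {} if some pattern node has no match.
    (Hypothetical encoding chosen for this sketch.)
    """
    # Successor sets of the data graph.
    g_succ = {v: set() for v in g_labels}
    for v, w in g_edges:
        g_succ[v].add(w)

    # Start from all label-compatible candidates.
    sim = {u: {v for v, lab in g_labels.items() if lab == q_labels[u]}
           for u in q_labels}

    # Prune until stable: v may match u only if, for every pattern edge
    # (u, u2), v has some successor that still matches u2.
    changed = True
    while changed:
        changed = False
        for u, u2 in q_edges:
            keep = {v for v in sim[u] if g_succ[v] & sim[u2]}
            if keep != sim[u]:
                sim[u] = keep
                changed = True

    if any(not matches for matches in sim.values()):
        return {}
    return sim


# Pattern: pa (label A) -> pb (label B).
q_labels = {"pa": "A", "pb": "B"}
q_edges = [("pa", "pb")]
# Data graph: a1 -> b1, plus isolated nodes a2 and b2.
g_labels = {"a1": "A", "a2": "A", "b1": "B", "b2": "B"}
g_edges = [("a1", "b1")]

result = graph_simulation(q_labels, q_edges, g_labels, g_edges)
```

In this toy input, `a2` is pruned because it has no successor labelled B, while the isolated node `b2` is kept as a match for `pb` because `pb` has no outgoing pattern edges. That a disconnected node can appear in the result gives a small-scale feel for the kind of loosely structured match the abstract describes.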

arXiv.org e-Print Archive

CiteSeerX

Edinburgh Research Explorer

Towards Certain Fixes with Editing Rules and Master Data

Authors: Fan Wenfei; Li Jianzhong; Ma Shuai; Tang Nan; Yu Wenyuan
Publication date: 01/01/2010

Edinburgh Research Explorer

Improving Data Quality: Consistency and Accuracy

Authors: Cong Gao; Fan Wenfei; Geerts Floris; Jia Xibei; Ma Shuai
Publication date: 01/01/2007

Edinburgh Research Explorer

Institutional Repository Universiteit Antwerpen

Unsupervised Neural Machine Translation with SMT as Posterior Regularization

Authors: Liu Shujie; Ma Shuai; Ren Shuo; Zhang Zhirui; Zhou Ming
Publication date: 13/01/2019

Without a real bilingual corpus available, unsupervised Neural Machine Translation (NMT) typically requires pseudo-parallel data generated with the back-translation method for model training. However, due to weak supervision, the pseudo data inevitably contain noise and errors that are accumulated and reinforced in the subsequent training process, leading to poor translation performance. To address this issue, we introduce phrase-based Statistical Machine Translation (SMT) models, which are robust to noisy data, as posterior regularizations to guide the training of unsupervised NMT models in the iterative back-translation process. Our method starts from SMT models built with pre-trained language models and word-level translation tables inferred from cross-lingual embeddings. Then the SMT and NMT models are optimized jointly and boost each other incrementally in a unified EM framework. In this way, (1) the negative effect caused by errors in the iterative back-translation process can be alleviated promptly by SMT filtering noise from its phrase tables; meanwhile, (2) NMT can compensate for the deficiency of fluency inherent in SMT. Experiments conducted on en-fr and en-de translation tasks show that our method outperforms the strong baseline and achieves new state-of-the-art unsupervised machine translation performance.
Comment: To be presented at AAAI 2019; 9 pages, 4 figure

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications